Whether MFCC or GFCC Is Better for Recognizing Emotion from Speech? A Study
Abstract
A major challenge for automatic speech recognition (ASR) is the significant performance degradation that occurs in noisy environments. Recently, the emotional content of speech signals has received growing attention, and many systems have been proposed to identify the emotion conveyed by a spoken utterance. The main stages in the design of a speech emotion recognition system are pre-processing, feature extraction, training, and classification/recognition. Typically, the extracted speaker features are short-time cepstral coefficients, such as Mel-frequency cepstral coefficients (MFCCs) and perceptual linear predictive (PLP) coefficients, or long-term features such as prosody. Such systems usually do not perform well under noisy conditions because the extracted features are distorted by noise, causing mismatched likelihood calculations. This study introduces a novel speaker feature, the gammatone frequency cepstral coefficient (GFCC), based on an auditory periphery model, and shows that this feature captures speaker characteristics and performs substantially better than conventional speaker features under noisy conditions. An important finding of the study is that GFCC features outperform conventional MFCC features in noise.
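To make the comparison concrete, the sketch below (a minimal illustration, not the paper's implementation) extracts standard MFCCs with librosa and a simplified frequency-domain GFCC approximation from the same utterance: gammatone-shaped weights on ERB-spaced channels, cubic-root compression in place of the log compression used for MFCC, then a DCT. The filterbank size, frame settings, and the file path are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch: MFCC vs. a frequency-domain GFCC approximation.
# Assumes numpy, scipy and librosa; parameter choices follow common GFCC
# recipes and are not the authors' exact pipeline.
import numpy as np
import librosa
from scipy.fftpack import dct

def erb_space(low_hz, high_hz, n_bands):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    ear_q, min_bw = 9.26449, 24.7
    erb_low = ear_q * np.log(1 + low_hz / (ear_q * min_bw))
    erb_high = ear_q * np.log(1 + high_hz / (ear_q * min_bw))
    erbs = np.linspace(erb_low, erb_high, n_bands)
    return ear_q * min_bw * (np.exp(erbs / ear_q) - 1)

def gfcc(y, sr, n_bands=64, n_ceps=22, n_fft=512, hop=160):
    """GFCC approximation: gammatone-shaped weights on the power spectrum,
    cubic-root compression, then a DCT over channels."""
    spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    centres = erb_space(50.0, sr / 2.0, n_bands)
    erb_bw = 24.7 + centres / 9.26449        # ERB bandwidth per channel (Hz)
    # Approximate 4th-order gammatone magnitude-squared response per channel.
    weights = (1.0 + ((freqs[None, :] - centres[:, None]) / erb_bw[:, None]) ** 2) ** -4
    gt_energy = weights @ spec               # (n_bands, n_frames)
    compressed = np.cbrt(gt_energy)          # cubic-root loudness compression
    return dct(compressed, type=2, axis=0, norm='ortho')[:n_ceps]

# "utterance.wav" is a placeholder path for an emotional-speech clip.
y, sr = librosa.load("utterance.wav", sr=16000)
mfcc_feat = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
gfcc_feat = gfcc(y, sr)
print(mfcc_feat.shape, gfcc_feat.shape)
```

The ERB-spaced channels and cubic-root compression (rather than the Mel filterbank with log compression) are the ingredients usually credited with the noise robustness that the abstract reports for GFCC.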
Related Works
Speech Emotion Recognition using GFCC and BPNN
In recent years, researchers have shown great interest in emotion-based speech recognition systems. The research has mainly aimed at bringing humans and computers closer together by recognizing mood changes. In the recognition process, emotions such as sadness and happiness must be represented by suitable features. When the verbal content is ...
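As a rough illustration of the GFCC-plus-BPNN pairing described in this excerpt, the sketch below pools frame-level GFCCs into one vector per utterance and trains scikit-learn's MLPClassifier as a stand-in for a back-propagation neural network. The pooling scheme, network size, emotion labels, and the random placeholder data are assumptions, not the cited paper's setup.

```python
# Hypothetical GFCC + BPNN pipeline (MLPClassifier used as the BPNN stand-in).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def pool_features(gfcc_frames):
    """Collapse a (n_ceps, n_frames) GFCC matrix into one utterance vector."""
    return np.concatenate([gfcc_frames.mean(axis=1), gfcc_frames.std(axis=1)])

# With real data, X would be built as np.stack([pool_features(gfcc(y, sr)) ...])
# using the gfcc() sketch above; here random data stands in for a corpus.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 44))          # 22 means + 22 stds per utterance
y = rng.choice(["happy", "sad", "angry", "neutral"], size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
bpnn = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
)
bpnn.fit(X_train, y_train)
print("held-out accuracy:", bpnn.score(X_test, y_test))
```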
Feature Extraction Techniques for Classification of Emotions in Speech Signals
Automatic speech emotion recognition is the process of recognizing emotions in speech. It has wide applications in psychiatric care, robotics, and human-computer interaction, which remains a challenging area of research. Any effective HCI system has two phases, training and testing, and the core techniques used are feature extraction and classification. This paper focuses on a brief ...
Hindi vowel classification using GFCC and formant analysis in sensor mismatch condition
In the presence of noise and sensor mismatch, the performance of a conventional automatic Hindi speech recognizer degrades, while human beings are able to segregate, focus on, and recognize the target speech. In this paper, we use the auditory-based feature extraction procedure gammatone frequency cepstral coefficients (GFCC) for Hindi phoneme classification. To distinguish vowels ...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics and automation have motivated researchers to improve the efficiency of interactive systems through natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Acoustic Emotion Recognition Using Linear and Nonlinear Cepstral Coefficients
Recognizing human emotions through the vocal channel has recently gained increased attention. In this paper, we study how the chosen features and classifiers impact the recognition accuracy of emotions present in speech. Four emotional states are considered for classification in this work. For this aim, features are extracted from the audio characteristics of emotional speech using Linea...